Redundant variables and Granger causality 

We discuss the use of multivariate Granger causality in presence of redundant variables: the 
application of the standard analysis, in this case, leads to under-estimation of causalities. Using the 
un-normalized version of the causality index, we quantitatively develop the notions of redundancy 
and synergy in the frame of causality and propose two approaches to group redundant variables: (i) 
for a given target, the remaining variables are grouped so as to maximize the total causality and 
(ii) the whole set of variables is partitioned to maximize the sum of the causalities between subsets. 
We show the application to a real neurological experiment, aiming at a deeper understanding of the
physiological basis of abnormal neuronal oscillations in the migraine brain. The outcome of our
approach reveals the change in the informational pattern due to repetitive transcranial magnetic
stimulations.

PACS numbers: 05.45.Tp,87.19.L- 

Wiener [1] and Granger [2] formalized the notion that if the prediction of one time series can be improved by
incorporating the knowledge of past values of a second one, then the latter is said to have a causal influence on
the former. Initially developed for econometric applications, Granger causality has also gained popularity among
physicists (see, e.g., [3-7]). A kernel method for Granger causality, introduced in [8], deals with the nonlinear case by
embedding data into a Hilbert space and searching for linear relations in that space. Geweke [9] has generalized
Granger causality to the multivariate case in order to identify conditional Granger causality; as described in [10],
multivariate causality may be used to infer the structure of dynamical networks [11] from data.

Granger causality is connected to the information flow between variables [12]. Another important notion in information
theory is the redundancy in a group of variables, formalized in [13] as a generalization of the mutual information.
A formalism to recognize redundant and synergetic variables in neuronal ensembles has been proposed in [14] and
generalized in [15]; the information theoretic treatment of groups of correlated degrees of freedom can reveal their
functional roles in complex systems.

The purpose of this work is to show that the presence of redundant variables influences the performance of
multivariate Granger causality and to propose a novel approach to exploit redundancy so as to identify functional patterns
in data. In the following we provide a quantitative definition to recognize redundancy and synergy in the frame of
causality and show that the maximization of the total causality is connected to the detection of groups of redundant
variables.

Let us consider n time series $\{x_\alpha(t)\}_{\alpha=1,\ldots,n}$ [16]; the state vectors are denoted

$$X_\alpha(t) = \left(x_\alpha(t-m), \ldots, x_\alpha(t-1)\right),$$

m being the window length (the choice of m can be made using the standard cross-validation scheme). Let $\epsilon(x_\alpha | X)$
be the mean squared error of the prediction of $x_\alpha$ on the basis of all the vectors X (corresponding to linear regression or
nonlinear regression by the kernel approach described in [8]). The multivariate Granger causality index $\delta(\beta \to \alpha)$
is defined as follows: consider the prediction of $x_\alpha$ on the basis of all the variables but $X_\beta$ and the prediction of $x_\alpha$
using all the variables; then the causality is the (normalized) variation of the error between the two conditions, i.e.

$$\delta(\beta \to \alpha) = \frac{\epsilon(x_\alpha | X \setminus X_\beta) - \epsilon(x_\alpha | X)}{\epsilon(x_\alpha | X \setminus X_\beta)}. \qquad (1)$$

Here we use the selection of significant eigenvalues described in [8] to address the problem of over-fitting in (1).
The straightforward generalization of Granger causality to sets of variables is

$$\delta(B \to A) = \sum_{\alpha \in A} \frac{\epsilon(x_\alpha | X \setminus B) - \epsilon(x_\alpha | X)}{\epsilon(x_\alpha | X \setminus B)}, \qquad (2)$$

where A and B are two disjoint subsets of $\{1, \ldots, n\}$, and $X \setminus B$ means the set of all variables except for those $X_\beta$
with $\beta \in B$.

On the other hand, its un-normalized version, i.e.

$$\delta^u(B \to A) = \sum_{\alpha \in A} \left\{ \epsilon(x_\alpha | X \setminus B) - \epsilon(x_\alpha | X) \right\}, \qquad (3)$$

can easily be shown to satisfy the following interesting property: if $\{X_\beta\}_{\beta \in B}$ are statistically independent and
their contributions in the model for A are additive, then

$$\delta^u(B \to A) = \sum_{\beta \in B} \delta^u(\beta \to A). \qquad (4)$$
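As a minimal sketch of how the un-normalized index (3) may be estimated in the linear case, the Python fragment below fits ordinary least-squares models with and without the past of the variables in B. The function names, the toy data and the coefficients 0.5 and 0.3 are illustrative assumptions; the kernel regression and the eigenvalue selection mentioned above are not reproduced here.

```python
import numpy as np

def prediction_error(data, target, drivers, m=1):
    """Mean squared residual when x_target(t) is linearly regressed on the
    m past values of every series listed in `drivers` (plus an intercept)."""
    n_samples = data.shape[0]
    y = data[m:, target]
    # Block k holds the values of the drivers lagged by k + 1 steps.
    lagged = [data[m - k - 1:n_samples - k - 1, drivers] for k in range(m)]
    X = np.hstack(lagged + [np.ones((n_samples - m, 1))])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.mean((y - X @ coef) ** 2)

def unnormalized_causality(data, B, A, m=1):
    """Estimate of eq. (3): sum over targets in A of the error increase
    caused by removing the variables in B from the set of predictors."""
    everything = list(range(data.shape[1]))
    reduced = [v for v in everything if v not in B]
    return sum(
        prediction_error(data, a, reduced, m) - prediction_error(data, a, everything, m)
        for a in A
    )

# Toy data (assumption): two independent drivers with additive contributions.
rng = np.random.default_rng(0)
N = 20000
x1 = rng.standard_normal(N)
x2 = rng.standard_normal(N)
z = np.zeros(N)
z[1:] = 0.5 * x1[:-1] + 0.3 * x2[:-1] + 0.5 * rng.standard_normal(N - 1)
data = np.column_stack([x1, x2, z])

group = unnormalized_causality(data, B=[0, 1], A=[2])
singles = (unnormalized_causality(data, [0], [2])
           + unnormalized_causality(data, [1], [2]))
print(group, singles)
```

For independent drivers with additive contributions the two printed numbers should nearly coincide, which is the content of property (4); any residual gap comes from finite-sample correlations between the drivers.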

In order to identify the informational character of a set of variables B, concerning the causal relationship $B \to A$, we
recall that, in general, synergy occurs if B contributes to A with more information than the sum of all its variables,
whilst redundancy corresponds to situations with the same information being shared by the variables in B. Following
[13-15], we make these notions quantitative and define the variables in B as redundant if $\delta^u(B \to A) > \sum_{\beta \in B} \delta^u(\beta \to A)$,
and synergetic if $\delta^u(B \to A) < \sum_{\beta \in B} \delta^u(\beta \to A)$. In order to justify these definitions, we first observe that the case
of independent variables (and additive contributions) falls neither in the redundant case nor in the synergetic
case, due to (4), as it should be. Moreover, we describe the following example for two variables $X_1$ and $X_2$. If $X_1$
and $X_2$ are redundant, then removing $X_1$ from the input variables of the regression model does not have a great
effect, as $X_2$ provides the same information as $X_1$; this implies that $\delta^u(X_1 \to A)$ is nearly zero. The same reasoning
holds for $X_2$, hence we expect that $\delta^u(\{X_1, X_2\} \to A) > \delta^u(X_1 \to A) + \delta^u(X_2 \to A)$. Conversely, let us suppose
that $X_1$ and $X_2$ are synergetic, i.e. they provide some information about A only when both variables are used in
the regression model; in this case $\delta^u(\{X_1, X_2\} \to A)$, $\delta^u(X_1 \to A)$ and $\delta^u(X_2 \to A)$ are almost equal and therefore
$\delta^u(\{X_1, X_2\} \to A) < \delta^u(X_1 \to A) + \delta^u(X_2 \to A)$.

Two analytically tractable cases are now reported as examples. Consider two stationary Gaussian time series
x(t) and y(t) with $\langle x^2(t) \rangle = \langle y^2(t) \rangle = 1$ and $\langle x(t) y(t) \rangle = C$; they correspond, e.g., to the asymptotic regime of the
autoregressive system

$$\begin{aligned} x_{t+1} &= a x_t + b y_t + \sigma \xi_{t+1}, \\ y_{t+1} &= b x_t + a y_t + \sigma \eta_{t+1}, \end{aligned} \qquad (5)$$

where $\xi$ and $\eta$ are i.i.d. unit variance Gaussian variables, $C = 2ab/(1 - a^2 - b^2)$ and $\sigma^2 = 1 - a^2 - b^2 - 2abC$. Considering
the time series $z_{t+1} = A (x_t + y_t) + \sigma' \xi'_{t+1}$ with $\sigma' = \sqrt{1 - 2A^2(1 + C)}$, we obtain for m = 1:

$$\delta^u(\{x, y\} \to z) - \delta^u(x \to z) - \delta^u(y \to z) = A^2 (C + C^2). \qquad (6)$$

Hence x and y are redundant (synergetic) for z if C is positive (negative). Turning to consider $w_{t+1} = B\, x_t \cdot y_t + \sigma'' \xi''_{t+1}$
with $\sigma'' = \sqrt{1 - B^2 (1 + 2C)^2}$, and using the polynomial kernel with p = 2, we have

$$\delta^u(\{x, y\} \to w) - \delta^u(x \to w) - \delta^u(y \to w) = B^2 (4C^2 - 1); \qquad (7)$$

x and y are synergetic (redundant) for w if $|C| < 0.5$ ($|C| > 0.5$).
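The first of the two cases can be explored numerically. The sketch below simulates system (5) together with the series z, estimates the three un-normalized causalities by linear regression with m = 1, and looks only at the sign of the excess of the joint causality over the sum of the individual ones. The helper `mse`, the random seed, the burn-in length and the sample size are assumptions made for illustration; the past of z itself is kept among the predictors in every model, following the definition of $\epsilon(x_\alpha | X)$.

```python
import numpy as np

def mse(y, predictors):
    """In-sample mean squared error of a least-squares fit with intercept."""
    X = np.column_stack(predictors + [np.ones_like(y)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.mean((y - X @ coef) ** 2)

# Parameters quoted in the text for the finite-sample study.
a, b, A = 0.5, 0.4, 0.4
C = 2 * a * b / (1 - a**2 - b**2)                   # stationary <x y>
sigma = np.sqrt(1 - a**2 - b**2 - 2 * a * b * C)    # noise amplitude in (5)
sigma_z = np.sqrt(1 - 2 * A**2 * (1 + C))           # noise amplitude for z

rng = np.random.default_rng(1)      # seed chosen arbitrarily
burn_in, N = 1000, 50000            # assumed lengths
T = burn_in + N
noise = rng.standard_normal((T, 3))
x, y, z = np.zeros(T), np.zeros(T), np.zeros(T)
for t in range(T - 1):
    x[t + 1] = a * x[t] + b * y[t] + sigma * noise[t + 1, 0]
    y[t + 1] = b * x[t] + a * y[t] + sigma * noise[t + 1, 1]
    z[t + 1] = A * (x[t] + y[t]) + sigma_z * noise[t + 1, 2]
x, y, z = x[burn_in:], y[burn_in:], z[burn_in:]     # drop the transient

target = z[1:]
px, py, pz = x[:-1], y[:-1], z[:-1]                 # one-step pasts (m = 1)
full = mse(target, [px, py, pz])
d_xy = mse(target, [pz]) - full                     # delta^u({x, y} -> z)
d_x = mse(target, [py, pz]) - full                  # delta^u(x -> z)
d_y = mse(target, [px, pz]) - full                  # delta^u(y -> z)
excess = d_xy - d_x - d_y
sample_C = np.corrcoef(x, y)[0, 1]
print("C (theory, sample):", C, sample_C)
print("redundant" if excess > 0 else "synergetic", excess)
```

With a = 0.5 and b = 0.4 the correlation C is positive, so a positive excess (redundancy) is the outcome expected from the discussion above.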

The presence of redundant variables leads to under-estimation of their causality when the standard multivariate
approach is applied (this is not the case for synergetic variables). Redundant variables should be grouped to get a
reliable measure of causality, and to characterize interactions in a more compact way. As is clear from the discussion
above, grouping redundant variables is connected to maximization of the un-normalized causality index (3) and, in
the general setting, can be done as follows. For a given target $\alpha_0$, we call B the set of the remaining n - 1 variables.
The partition $\{A_i\}$ of B maximizing the total causality

$$\Delta = \sum_i \delta^u(A_i \to \alpha_0)$$

consists of groups of redundant variables. Concerning the problem of finite sample size, we consider N samples from
eqs. (5), with a = 0.5 and b = 0.4, and estimate causalities on these data. In figure 1 we depict, as a function of
N, the fraction f of times that x and y are recognized as redundant for the variable z (with A = 0.4); a large
amount of data is needed to assess significant causality and so to discover redundancy. The present approach can
thus be used only in applications where a large number of samples is available.
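The search over partitions for a given target can be sketched as an exhaustive enumeration. In the fragment below, `set_partitions` and `best_partition` are hypothetical helper names, and the table `toy_values` holds made-up un-normalized causalities toward a single target, chosen so that variables 0 and 1 are redundant; in a real analysis the callable passed as `causality` would return an estimate of $\delta^u(A_i \to \alpha_0)$ from data.

```python
def set_partitions(items):
    """Yield every partition of the list `items` as a list of lists."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partial in set_partitions(rest):
        # Put `first` in a group of its own ...
        yield [[first]] + partial
        # ... or add it to each existing group in turn.
        for i in range(len(partial)):
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]

def best_partition(variables, causality):
    """Return (partition, total) maximizing the summed causality of the groups."""
    best, best_total = None, float("-inf")
    for partition in set_partitions(list(variables)):
        total = sum(causality(frozenset(group)) for group in partition)
        if total > best_total:
            best, best_total = sorted(partition), total
    return best, best_total

# Made-up values (assumption): 0 and 1 carry the same information, so each
# alone has a small index while the pair {0, 1} has a large one.
toy_values = {
    frozenset({0}): 0.05, frozenset({1}): 0.05, frozenset({2}): 0.30,
    frozenset({0, 1}): 0.40, frozenset({0, 2}): 0.33, frozenset({1, 2}): 0.33,
    frozenset({0, 1, 2}): 0.60,
}
best, total = best_partition([0, 1, 2], toy_values.__getitem__)
print(best, total)
```

The number of partitions grows quickly with the number of variables (5 for three variables, 203 for six), so exhaustive search is practical only for small sets. Ties are resolved here by keeping the first maximum found, which is an arbitrary choice.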

Another example consists of a system of nine oscillators evolving according to noisy Kuramoto's equations [13]:

$$\dot{\theta}_i = \omega_i + K \sum_{j=1}^{9} \sin(\theta_j - \theta_i) + \xi_i(t). \qquad (8)$$

We consider three groups of oscillators, each made of three oscillators with the same natural frequency, respectively
$\omega = 1, 2, 4$; the noise strength is 0.01. Using the approach for circular variables described in [18], we find that the
partition $\{A_i\}$ of the nine oscillators maximizing the sum of the causalities between every pair of subsets,

$$\Gamma = \sum_{i \neq j} \delta^u(A_i \to A_j),$$

is {1, 2, 3}{4, 5, 6}{7, 8, 9}, corresponding to oscillators with the same natural frequency belonging to the same subset.
In figure 2 we depict the optimal $\Gamma$ and the value of $\Gamma$ corresponding to the partition where each oscillator constitutes
a set, versus the coupling K. It is clear that the maximization of $\Gamma$ reveals the structure of the system in this example.
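Data for this kind of test can be generated with a simple stochastic integrator. The sketch below integrates the noisy Kuramoto equations (8) with an Euler-Maruyama step; the function name, the time step, the number of steps, the initial phases, the value of K and the reading of the "noise strength" as the amplitude multiplying a Wiener increment are all assumptions, and the causality analysis for circular variables is not included.

```python
import numpy as np

def simulate_kuramoto(K, omega, noise=0.01, dt=0.01, steps=5000, seed=0):
    """Euler-Maruyama integration of eq. (8); returns the unwrapped phases
    as an array of shape (steps + 1, number of oscillators)."""
    rng = np.random.default_rng(seed)
    omega = np.asarray(omega, dtype=float)
    n = omega.size
    theta = np.empty((steps + 1, n))
    theta[0] = rng.uniform(0.0, 2.0 * np.pi, n)   # random initial phases
    for k in range(steps):
        # diff[i, j] = theta_j - theta_i at the current step
        diff = theta[k][None, :] - theta[k][:, None]
        drift = omega + K * np.sin(diff).sum(axis=1)
        kick = noise * np.sqrt(dt) * rng.standard_normal(n)
        theta[k + 1] = theta[k] + drift * dt + kick
    return theta

# Three groups of three oscillators with natural frequencies 1, 2 and 4.
omega = np.array([1, 1, 1, 2, 2, 2, 4, 4, 4], dtype=float)
phases = simulate_kuramoto(K=0.05, omega=omega)   # K = 0.05 is an arbitrary choice
print(phases.shape)
```

Repeating the simulation over a range of K values and feeding the phases to a causality estimator suited to circular variables would be the next step toward a curve like the one described for figure 2.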

Now we turn to consider a real application, i.e. EEG data from nineteen subjects suffering from migraine, under
steady state flash stimuli (9 Hz) and repetitive transcranial magnetic stimulation (rTMS), a noninvasive method to
excite neurons in the brain [19]. Migraine is a complex disorder of neurovascular origin whose pathophysiological
basis is largely unknown. An altered cortical excitability may activate the trigemino-vascular system, but whether
basal cortical excitability is reduced or enhanced is currently a matter of debate [20]. In a previous work [21]
anomalous cortical synchronization in migraineurs under flash stimuli has been reported. A better understanding of
migraine pathophysiology may improve its therapeutic approach: in this view, studies employing neurophysiological
techniques, possibly supported by advanced methods of quantitative analysis, may aid the understanding of
migraine pathophysiology [22]. An important feature of the migraine brain is the tendency to hypersynchronization
of alpha rhythms, which is influenced by anti-epileptic drugs [23]. rTMS induces a cortical modulation that lasts
beyond the time of stimulation [24]: its effects depend on the frequency of stimulation. In order to understand
the physiological basis of abnormal neuronal oscillations in the migraine brain, we apply 1 Hz rTMS over the occipital
cortex before performing repetitive flash stimulation. The records are 12 seconds long, sampled at 256 Hz: this EEG
duration is representative of the pattern of brain responsiveness to light stimuli, as previously shown [21]. The signals
are measured on seven channels (Fz, P3, P4, Cz, O1, Oz, O2) in three conditions: basal (only flash stimuli), sham (placebo,
i.e. flash stimuli and a fake magnetic stimulator) and rTMS (flash stimuli and magnetic stimulations). As in the
example above, for each target channel we exhaustively search for the partition of the remaining six channels which
leads to the highest total causality $\Delta$ (averaged over the nineteen patients). In basal and sham conditions, we find
that, for each target channel, the optimal partition is always a single set containing all six remaining channels; in
other words, all the channels are redundant in these conditions. In presence of rTMS the causality pattern becomes
more complex, and not all sets of variables are redundant with respect to the prediction of the others. All six remaining
channels are redundant for targets Fz, P4, O1, Oz; for the other channels the best partitions are

{Fz,P4,O1,Oz,O2}{P3} -> Cz
{Fz,P4,O1,Oz,O2}{Cz} -> P3        (9)
{Fz,P4,O1,Oz}{Cz,P3} -> O2.

These relations suggest the presence of a new source of information, due to magnetic stimulations, corresponding to
the Cz and P3 channels. We also search for the partition of the seven channels maximizing the total causality between
groups ($\Gamma$), averaged over the patients. We find that the best partition is {Fz, P4, O1, O2}{Cz, P3, Oz} for basal and
sham conditions. For the rTMS condition, instead, the best partition is {Fz, P4, Oz}{Cz, P3}{O1}{O2}; this result is
consistent with the previous analysis as the channels Cz and P3 are grouped, see figure 3.

The change of the informational pattern, induced by occipital cortex inhibition, may confirm that neuronal
oscillations are related to the state of cortical excitability. Presently, we have no explanation for the significance of the
specific Cz-P3 group related to the rTMS effect, but we can assert that oscillations in the migraine brain vary as a function
of cortical excitability. The reliability of this pattern in migraine needs to be checked against a control group, so as to
better understand the peculiar reactivity of the migraine brain and to find the optimal way to influence it. Some remarks
are in order. Averaging over patients is mandatory to reduce the effects due to the variability among subjects. Our
results are obtained using the linear kernel and m = 1, but the same partitions are obtained using the quadratic kernel
and m = 2 (application of cross-validation to these data suggests a low value of the order m; therefore we restrict
our analysis to m = 1, 2). We find, in this real application, that the optimal partition maximizing the total causality
is unique in all cases. It may happen, in other instances, that several partitions have the same total causality: in
those cases prior information should be used to select one of the degenerate partitions.

Summarizing, in this work we have quantitatively developed the notions of redundancy and synergy in the frame 
of causality. We have proposed to generalize the standard multivariate Granger method in presence of redundant 
variables, by using the causality index without normalization, and analyzing the system as follows: (i) for a given 
target, the remaining variables are grouped so as to maximize the total causality and (ii) the whole set of variables 
is partitioned to maximize the sum of the causalities between groups. Applied to real data from a neurophysiological
experiment, the proposed approach was able to detect the informational pattern induced by magnetic stimulations.

FIG. 2: Concerning the system of nine oscillators described in the text, we depict the sum of the causalities between every pair of
subsets $\Gamma$ (see the text) corresponding to the partitions {1, 2, 3}{4, 5, 6}{7, 8, 9} (empty circles) and {1}{2}{3}{4}{5}{6}{7}{8}{9}
(stars). Causalities are estimated over 5000 samples for each value of K.

FIG. 3: The partitions of electrodes maximizing $\Gamma$ (see the text). Left: the optimal partition for basal and sham conditions.
Right: the optimal partition in presence of rTMS.